NSF PAR Search | NSF Public Access Repository


Can LLMs Implicitly Learn Numeric Parameter Constraints in Data Science APIs?

Deng, Yinlin; Xia, Chunqiu Steven; Cao, Zhezhen; Li, Meiziniu; Zhang, Lingming (December 2024, NeurIPS 2024 (Curran Associates))

Full Text Available
Automated Program Repair via Conversation: Fixing 162 out of 337 Bugs for $0.42 Each using ChatGPT

https://doi.org/10.1145/3650212.3680323

Xia, Chunqiu Steven; Zhang, Lingming (September 2024, ACM)

Full Text Available
Benchmarking Automated Program Repair: An Extensive Study on Both Real-World and Artificial Bugs

https://doi.org/10.1145/3650212.3652140

Ouyang, Yicheng; Yang, Jun; Zhang, Lingming (September 2024, ACM)

Full Text Available
Top Leaderboard Ranking = Top Coding Proficiency, Always? EvoEval: Evolving Coding Benchmarks via LLM

Xia, Chunqiu Steven; Deng, Yinlin; Zhang, Lingming (August 2024, OpenReview)

Full Text Available
SelfCodeAlign: Self-Alignment for Code Generation

Wei, Yuxiang; Cassano, Federico; Liu, Jiawei; Ding, Yifeng; Jain, Naman; Mueller, Zachary; de Vries, Harm; von Werra, Leandro; Guha, Arjun; Zhang, Lingming (December 2024, NeurIPS 2024)

Instruction tuning is a supervised fine-tuning approach that significantly improves the ability of large language models (LLMs) to follow human instructions. For programming tasks, most models are finetuned with costly human-annotated instruction-response pairs or those generated by large, proprietary LLMs, which may not be permitted. We propose SelfCodeAlign, the first fully transparent and permissive pipeline for self-aligning code LLMs without extensive human annotations or distillation. SelfCodeAlign employs the same base model for inference throughout the data generation process. It first extracts diverse coding concepts from high-quality seed snippets to generate new tasks. It then samples multiple responses per task, pairs each with test cases, and validates them in a sandbox environment. Finally, passing examples are selected for instruction tuning. In our primary experiments, we use SelfCodeAlign with CodeQwen1.5-7B to generate a dataset of 74k instruction-response pairs. Finetuning on this dataset leads to a model that achieves a 67.1 pass@1 on HumanEval+, surpassing CodeLlama-70B-Instruct despite being ten times smaller. Across all benchmarks, this finetuned model consistently outperforms the original version trained with OctoPack, the previous state-of-the-art method for instruction tuning without human annotations or distillation. Additionally, we show that SelfCodeAlign is effective across LLMs of various sizes, from 3B to 33B, and that the base models can benefit more from alignment with their own data distribution. We further validate each component’s effectiveness in our pipeline, showing that SelfCodeAlign outperforms both direct distillation from GPT-4o and leading GPT-3.5-based distillation methods, such as OSS-Instruct and Evol-Instruct. SelfCodeAlign has also led to the creation of StarCoder2-Instruct, the first fully transparent, permissively licensed, and self-aligned code LLM that achieves state-of-the-art coding performance. 
Overall, SelfCodeAlign shows for the first time that a strong instruction-tuned code LLM can result from self-alignment rather than distillation.
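
The validate-and-select step described in the abstract (pair each sampled response with test cases, execute them, keep only the passing examples) can be sketched as follows. This is a minimal illustration and not the authors' implementation: the `Candidate` type, the function names, and the sample data are hypothetical, and a child Python process with a timeout stands in for a real sandbox.

```python
import subprocess
import sys
from dataclasses import dataclass


@dataclass
class Candidate:
    instruction: str  # task description shown to the model
    response: str     # sampled solution code
    tests: str        # assert-based tests paired with the response


def passes_tests(candidate: Candidate, timeout_s: float = 5.0) -> bool:
    """Run solution + tests in a child interpreter; True if it exits with 0.

    Assumption: a subprocess with a timeout is only a stand-in for a sandbox.
    """
    program = candidate.response + "\n\n" + candidate.tests
    try:
        result = subprocess.run(
            [sys.executable, "-c", program],
            capture_output=True,
            timeout=timeout_s,
        )
    except subprocess.TimeoutExpired:
        return False
    return result.returncode == 0


def select_passing(candidates: list[Candidate]) -> list[tuple[str, str]]:
    """Keep the (instruction, response) pairs whose tests pass."""
    return [(c.instruction, c.response) for c in candidates if passes_tests(c)]


# Hypothetical samples: two responses to the same task, one of them buggy.
samples = [
    Candidate(
        "Write add(a, b) that returns the sum.",
        "def add(a, b):\n    return a - b",
        "assert add(2, 3) == 5",
    ),
    Candidate(
        "Write add(a, b) that returns the sum.",
        "def add(a, b):\n    return a + b",
        "assert add(2, 3) == 5",
    ),
]
dataset = select_passing(samples)
```

Here `dataset` holds only the second sample, since the first fails its own assertion. Running untrusted model output in a plain subprocess is unsafe outside a toy setting; an isolated container or similar is the kind of environment the abstract's "sandbox" implies.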
Full Text Available
WhiteFox: White-Box Compiler Fuzzing Empowered by Large Language Models

https://doi.org/10.1145/3689736

Yang, Chenyuan; Deng, Yinlin; Lu, Runyu; Yao, Jiayi; Liu, Jiawei; Jabbarvand, Reyhaneh; Zhang, Lingming (October 2024, Proceedings of the ACM on Programming Languages)

Compiler correctness is crucial, as miscompilation can falsify program behaviors, leading to serious consequences over the software supply chain. In the literature, fuzzing has been extensively studied to uncover compiler defects. However, compiler fuzzing remains challenging: existing work focuses on black- and grey-box fuzzing, which generates test programs without sufficient understanding of internal compiler behaviors. As such, these techniques often fail to construct test programs that exercise intricate optimizations. Meanwhile, traditional white-box techniques, such as symbolic execution, are computationally infeasible for the very large codebases of compiler systems. Recent advances demonstrate that Large Language Models (LLMs) excel in code generation/understanding tasks and have even achieved state-of-the-art performance in black-box fuzzing. Nonetheless, guiding LLMs with compiler source-code information remains a missing piece of research in compiler testing. To this end, we propose WhiteFox, the first white-box compiler fuzzer using LLMs with source-code information to test compiler optimization, with a spotlight on detecting deep logic bugs in the emerging deep learning (DL) compilers. WhiteFox adopts a multi-agent framework: (i) an LLM-based analysis agent examines the low-level optimization source code and produces requirements on the high-level test programs that can trigger the optimization; (ii) an LLM-based generation agent produces test programs based on the summarized requirements. Additionally, optimization-triggering tests are used as feedback to further enhance the test generation prompt on the fly. Our evaluation on the three most popular DL compilers (i.e., PyTorch Inductor, TensorFlow-XLA, and TensorFlow Lite) shows that WhiteFox can generate high-quality test programs to exercise deep optimizations requiring intricate conditions, exercising up to 8 times more optimizations than state-of-the-art fuzzers.
To date, WhiteFox has found a total of 101 bugs in the compilers under test, with 92 confirmed as previously unknown and 70 already fixed. Notably, WhiteFox has recently been acknowledged by the PyTorch team and is in the process of being incorporated into its development workflow. Finally, beyond DL compilers, WhiteFox can also be adapted for compilers in different domains, such as LLVM, where WhiteFox has already found multiple bugs.
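
The two-agent loop with feedback described in the abstract can be sketched roughly as below. Every name here is hypothetical and the three callables are toy stubs: in the described system the analysis and generation agents are LLMs, and whether a test triggers an optimization is observed from the compiler itself, not from a string match.

```python
from typing import Callable


def fuzz_optimization(
    opt_source: str,
    analyze: Callable[[str], str],
    generate: Callable[[str, list[str]], list[str]],
    triggers: Callable[[str], bool],
    rounds: int = 3,
    max_examples: int = 2,
) -> list[str]:
    """Return the distinct generated tests that triggered the optimization."""
    # Analysis agent: optimization source code -> requirement on test programs.
    requirement = analyze(opt_source)
    triggering: list[str] = []
    for _ in range(rounds):
        # Feedback: the most recent triggering tests become prompt examples.
        examples = triggering[-max_examples:]
        # Generation agent: requirement + examples -> new test programs.
        for test in generate(requirement, examples):
            if triggers(test) and test not in triggering:
                triggering.append(test)
    return triggering


# Toy stand-ins for the agents and for the compiler-side trigger check.
def toy_analyze(source: str) -> str:
    return "program must apply relu to the result of matmul"


def toy_generate(requirement: str, examples: list[str]) -> list[str]:
    base = ["y = relu(matmul(a, b))", "y = add(a, b)"]
    return base + [ex + "  # variant" for ex in examples]


def toy_triggers(test: str) -> bool:
    return "relu(matmul" in test


found = fuzz_optimization("<optimization pass source>", toy_analyze,
                          toy_generate, toy_triggers)
```

With these stubs each round adds one new variant derived from an earlier triggering test, which is the point of the feedback step: successful tests steer later generations toward the same optimization.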
Full Text Available
SelfCodeAlign: Self-Alignment for Code Generation

Wei, Yuxiang; Cassano, Federico; Liu, Jiawei; Ding, Yifeng; Jain, Naman; Mueller, Zachary; de Vries, Harm; von Werra, Leandro; Guha, Arjun; Zhang, Lingming (December 2024, NeurIPS 2024 (Curran Associates))

Full Text Available
Evaluating Language Models for Efficient Code Generation

Liu, Jiawei; Xie, Songrun; Wang, Junhao; Wei, Yuxiang; Ding, Yifeng; Zhang, Lingming (August 2024, OpenReview)

Full Text Available
Magicoder: Empowering Code Generation with OSS-Instruct

Wei, Yuxiang; Wang, Zhe; Liu, Jiawei; Ding, Yifeng; Zhang, Lingming (July 2024, ACM)

Full Text Available
Fuzz4All: Universal Fuzzing with Large Language Models

https://doi.org/10.1145/3597503.3639121

Xia, Chunqiu Steven; Paltenghi, Matteo; Le Tian, Jia; Pradel, Michael; Zhang, Lingming (April 2024, ACM)

Full Text Available
